1. INTRODUCTION

Real-time visual object tracking remains a critical challenge in computer vision. The
challenges arise from image noise, reflective backgrounds, the computational cost of handling dynamic object
motion, cluttered backgrounds, illumination changes in static and dynamic images, and partial or full
occlusions that occur during real-time processing [1, 2]. Real-world scenes are three-dimensional, and
projecting them onto a 2D image causes a substantial loss of information.

Tracking is the process of monitoring an object frame by frame, from its first appearance to its
final position. The type of target and the description of its characteristics within the system depend on the
application. Robustness and efficiency are the two main challenges for existing trackers. Most
robust trackers are implemented with a single feature or a combination of features, at a high computational cost.

Tracker robustness can be enhanced by using a multi-view model with discriminative parameters.
An effective tracker should also handle variations of the object and its background.
Generative and discriminative approaches are the two approaches used to track single or multiple objects in computer
vision. The generative approach uses model-based and classical tracking methods to track a moving object.
Discriminative object tracking algorithms are mainly built on deep learning methods. The main
drawback of these algorithms is that they may require a large training dataset.

This review paper has two aims. The first is to review how robust
trackers cope with several challenges. The second is to tabulate the trackers best suited to
real-time tracking.


1.1. Challenges in Visual Tracking 
1.1.1. The problems in visual tracking 

The practical context of a target tracking system rests on three key problems.

Robustness: The tracking algorithm should be able to track the target under difficult conditions. The
tracking problems may include cluttered backgrounds, partially or fully changing illumination, occlusions, and
object motion.

Mutation: The target itself undergoes changes in a different environment in each frame. The tracking system
requires a stable mechanism for adapting to the actual object appearance.

Implementation: Fast, optimized, and robust algorithms are required to sustain the frame rate and produce
smooth video output.


1.1.2. Visual tracker 
Under specific rigid boundary and transient conditions, objects are monitored in real-time frames.
The real-time tracker automatically interprets the object to be monitored and collects the information for the
context specified above. Figure 1 shows the flow diagram of a visual object tracker.


Figure 1. Flow diagram of a visual object tracker





a) Object detection 

Object detection can be done with two approaches. (1) Temporal differencing: two consecutive
frames are subtracted and the difference is compared against a threshold. (2) Background subtraction:
the current frame is subtracted from a background or reference model image to separate the foreground.
Morphological operations are then applied to the result of either approach to remove noise from the image.
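
The two detection approaches and the morphological clean-up can be sketched in a few lines. The following is a minimal, standard-library-only illustration, assuming grayscale frames stored as nested lists of intensities; the function names, the threshold of 25, and the 3x3 structuring element are illustrative assumptions, not values taken from the cited works.

```python
def difference_mask(frame_a, frame_b, threshold=25):
    """Binary mask of pixels whose absolute difference exceeds the threshold.

    Passing two consecutive frames gives temporal differencing; passing a
    frame and a background/reference image gives background subtraction.
    """
    return [[abs(a - b) > threshold for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def _neighbourhood(mask, y, x):
    """Values in the 3x3 window around (y, x); outside the image counts as False."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i] if 0 <= j < h and 0 <= i < w else False
            for j in range(y - 1, y + 2) for i in range(x - 1, x + 2)]


def erode(mask):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return [[all(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    return [[any(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def opening(mask):
    """Morphological opening (erosion then dilation) removes isolated noise pixels."""
    return dilate(erode(mask))
```

Calling `opening(difference_mask(frame, background))` keeps compact foreground blobs while discarding single-pixel noise responses.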

b) Object tracking 

Two main approaches are used to track a real-time object: a 2-D model approach
and a 3-D model approach. The 2-D model tracks the object using a rectangular model or a U-shape model; it
consists of an image acquisition module and processes the coordinates for single- and multiple-target trackers. The 3-D
geometrical, model-based approach uses explicit a priori geometrical knowledge of the object for
surveillance in different applications.

Once the model is fixed, model-based approaches must cope with varying context such as illumination,
occlusions, and (self-)collisions [4]. Most tracking models use a filtering mechanism to detect each movement
of the recognized object [4-6].

Extended Kalman filters (EKF) and particle filters have also been proposed [5, 7]. Hidden
Markov models (HMMs) predict and track object trajectories [8].
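
To illustrate the filtering mechanism, the following is a minimal sketch of one predict/update cycle of a scalar, linear Kalman filter. It is not the extended Kalman filter of the cited works: it assumes a one-dimensional position, a random-walk motion model, and illustrative noise variances `q` and `r`.

```python
def kalman_step(x, p, z, q=0.01, r=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    z    : new measurement
    q, r : process and measurement noise variances (assumed values)
    """
    # Predict: a random-walk motion model keeps x and inflates the variance.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


def track(measurements, x0=0.0, p0=100.0):
    """Run the filter over a sequence of measurements and return the estimates."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p = kalman_step(x, p, z)
        estimates.append(x)
    return estimates
```

A large initial variance `p0` makes the filter trust the first measurements; as the variance shrinks, the estimate becomes smoother and less sensitive to measurement noise.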

c) Behavioral analysis 

The final phase of a video surveillance system is to monitor the activity and behavior of the target.
The time-varying feature data provide the information for the next stage, which contains a pre-compiled
library of measurement sequences used to label the training dataset, also called a deep-learning model.


2. REVIEW STRUCTURE 

The flow diagram in Figure 2 shows the review structure and its real-time challenges. Every tracking
system needs one of two types of input: a static image or a dynamic input. The context of the
different models depends on the real-time objects that appear in the scene.


a) Sec-A. Appearance model 

Yang Hua et al. [9] compute the appearance model of the ROI object using HOG
features. Their algorithm uses bounding boxes with a linear SVM for learning and detecting the
tracked object. The object location is estimated from a set of positive samples given by the bounding box
in the first frame, with negative bounding-box samples generated automatically. Figure 3 shows the results of the SVM
model.
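
The core of a HOG descriptor is a histogram of gradient orientations weighted by gradient magnitude. The sketch below is a simplified illustration for a single cell, assuming a grayscale cell stored as nested lists, 9 unsigned orientation bins, and hard bin assignment; block normalization, bin interpolation, and the linear SVM stage are omitted, and the function name is hypothetical.

```python
import math


def orientation_histogram(cell, bins=9):
    """Gradient-orientation histogram of one cell (simplified HOG building block)."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Central differences need both neighbours, so border pixels are skipped.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

Concatenating such histograms over all cells of a bounding box gives a feature vector that a linear classifier can score.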

Meijuan Bail et al. [10] use two types of algorithm: a multi-feature
representation (MFR) and a classifier-learning model (CLM). The model extracts intensity and pattern
features; 71% of the features are pattern based, which are unstable when the environment or the object's pattern changes.
Figure 4 shows the compressive tracking algorithm proposed by the authors. Heng Fan, Jinhai Xiang et al. [11]
apply an MJDL (multitask joint dictionary learning) model to the target object to extract the
modality features of the corresponding discriminative dictionary. Yong Wang et al. [12]
consider different trackers, such as Incremental Visual Tracking (IVT), L1 tracking (L1T), L1-APG tracking, multi-task
tracking (MTT-L01, MTT-L21), Multiple Instance Learning tracking (MIL), compressive tracking (CT),
Wacv12, WMIL, LSST, and L2-RLS, analyze 22 video sequences, and compare with seven
state-of-the-art trackers, as shown in Figure 5.





Figure 2. Flow diagram of review structure

Figure 5. Author Proposed compressive tracking algorithm

Jianghu Lu et al. [13] use an efficient model called CT (compressive
tracking), which detects and tracks the target. Both the generative and the discriminative model use the compressive
domain for the extraction of appearance features. Compressive sensing (CS) adaptively reduces the dimension of multi-scale
features.
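
The dimensionality reduction behind compressive tracking can be illustrated with a very sparse random projection. The sketch below assumes the common construction in which each matrix entry is +sqrt(s) or -sqrt(s) with probability 1/(2s) each and zero otherwise; the value s = 3, the seed, and the function names are illustrative assumptions and are not taken from [13].

```python
import math
import random


def sparse_measurement_matrix(n_rows, n_cols, s=3, seed=0):
    """Very sparse random matrix: entries are +sqrt(s) or -sqrt(s) with
    probability 1/(2s) each, and 0 otherwise."""
    rng = random.Random(seed)
    scale = math.sqrt(s)
    matrix = []
    for _ in range(n_rows):
        row = []
        for _ in range(n_cols):
            u = rng.random()
            if u < 1.0 / (2 * s):
                row.append(scale)
            elif u < 1.0 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def compress(matrix, features):
    """Project a high-dimensional feature vector to len(matrix) dimensions."""
    return [sum(m * f for m, f in zip(row, features)) for row in matrix]
```

Because the matrix is fixed after construction and mostly zero, the same projection can be applied cheaply to every candidate patch in every frame.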

Junseok Kwon et al. [19] use different trackers, such as VTS, MIL, MIT and
ATS, to track the target successfully under appearance changes. The VTS model is used to track the object, while the ATS
model concentrates on the degree of variation of the object. Boris et al. [36] use the multiple
instance learning tracker (MILT), in which the appearance model is updated with a set of image patches.
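
In multiple instance learning, image patches are grouped into bags, and a bag is labeled positive if at least one of its patches is positive. A common way to express this is the noisy-OR model, sketched below under the assumption that each patch already has an instance probability from some classifier; the function name is hypothetical.

```python
def bag_probability(instance_probs):
    """Noisy-OR: a bag is positive if at least one instance is positive."""
    prob_all_negative = 1.0
    for p in instance_probs:
        prob_all_negative *= (1.0 - p)
    return 1.0 - prob_all_negative
```

For example, two patches that are each positive with probability 0.5 give a bag probability of 0.75, so a slightly misaligned patch does not have to be labeled exactly.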


b) Sec-B. Illumination model 

Arvind Nayak et al. [14] use an automatic correction scheme that
transforms an image taken under unknown illumination to match a known illumination model. The
correction scheme is tested on color and gray-level images.

Sareh Shirazi et al. [16] propose an adaptive tracking model that
continuously updates a set of affine subspaces, where each subspace is built from the object appearance over
several consecutive frames. In a new frame, a candidate image area for locating the object is proposed by
including the immediate tracking history of other frames. The non-Euclidean geometry of Grassmann manifolds
is used to compare the affine subspaces of the object model and of the candidate area.

Junseok Kwon et al. [17] illustrate the tracking of a target under
illumination changes using the WLMC and OIF models, which track abrupt changes in the appearance of the object. Figures 6 and 7
show the results of the proposed tracker.





Figure 7. Author Proposed WLMC and OIF model as tracking algorithm 


Haoyu Ren et al. [18] propose a model called co-occurrence features, based on
Haar, LBP, and HOG, for appearance-based detection of the target. It also uses a boosted detector, which gives high
accuracy and can be computed efficiently. The GEB framework is used for discriminative ability and generalization power.

Junseok Kwon et al. [19] track the object in each frame with
MUG (minimum uncertainty gap) estimation instead of MAP estimation. The drift problem caused by a noisy target,
which affects conventional MAP-based models, can thereby be minimized. Figure 8 shows the test results on a standard dataset with MUG.

(c) face sequence (d) david sequence


Figure 8. Author Proposed MAP-based as model tracking algorithm 


Zhiyong Li, Song Gao and Ke Nai et al. [20] concentrate on MIL, ODFS, and
AFS, which track the target accurately. The weight of each feature of the target is set adaptively based
on the ROI object. Table 1 shows the tracking rates of the different algorithms.

Xi Chen et al. [21] describe a new visual tracking method based on
cognitive modeling and a particle filter. In this method, six independent eigenvectors are obtained from a model called VOCUS2. This
kind of model is used on the appearance of a moving object against a similar background. The average detection rate
is 71.8%.


Table 1. Tracking rate of different algorithms


TRACKING RATE OF DIFFERENT ALGORITHMS (FPS)


Sequence            CT    MIL   ODFS   AFS-   VR-V_   Ours
PETS2001            21    3     20     15     17      ?
ScaleCar            22    2     20     15     13      17
Intelligent Room    40    -     37     29     26      26
Pedestrian          22    3     17     13     26      18
Car                 48    6     42     29     40      28
BlurCar             18    2     15     14     15      16
Average SR          29    3     25     19     23      21


c) Sec-C. Occlusion model 

Yichun Shi et al. [22] propose a Tracker based on an Ensemble of Random
Classifiers (TERC), which achieves state-of-the-art tracking results. The distribution field tracker (DFT), the
circulant structure tracker (CSK), the compressive tracker (CT), and the locality-sensitive histogram tracker
(LSH) are also considered on 10 challenging sequences, for example girl and tiger. Zhaoyun Chen et
al. [23] extend STC (spatio-temporal context learning) by exploiting
RGB-D data. Depth information is introduced into the spatio-temporal context model to improve scale
estimation and to handle occlusion and deformation.

Raed Almomani et al. [24] propose a model called BHAM (Bayesian
Hierarchical Appearance Model), which detects partial and full occlusion. The moving target is
selected and segmented by background subtraction. KLT features connect the blobs in
consecutive frames. Ding Dongsheng et al. [25] propose fusing a texture feature model to update
the target template, because color tracking alone has low robustness. This method uses the particle filter algorithm (PFA) [28] to handle
occlusion. Figure 9 shows the test results of PFA.
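
A particle filter keeps a set of state hypotheses, which helps it recover when the target is briefly occluded. The following is a minimal one-dimensional sketch of one predict, weight, and resample cycle. It assumes a constant-velocity motion model and a Gaussian measurement likelihood; all parameter values and names are illustrative and are not taken from [25] or [28].

```python
import math
import random


def particle_filter_step(particles, measurement, rng,
                         velocity=1.0, motion_std=0.3, meas_std=0.5):
    """One predict/weight/resample cycle for a 1-D position tracker."""
    # Predict: move every particle with the motion model plus random noise.
    moved = [p + velocity + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in moved]
    if sum(weights) == 0.0:
        # All particles are far from the measurement (e.g. after occlusion):
        # fall back to uniform weights instead of failing.
        weights = [1.0] * len(moved)
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    estimate = sum(resampled) / len(resampled)
    return resampled, estimate
```

In an image tracker, the state would hold a bounding box and the weight would come from a color or texture similarity score rather than a scalar distance.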





(c) Frame 140 (d) Frame 1850


Figure 9. Author Proposed particle filter as model tracking algorithm

ZHU Su et al. [26] propose a novel, robust two-model MFT algorithm for
saliency mapping. The paper also introduces a particle filter (PF) for handling illumination variation and occlusion. Jin Yuan
et al. [27] propose a novel object tracking algorithm for real-time video. The Fast Fourier
Transform (FFT) is used to extract the feature template of the object. It is also applied to the current
and previous frames. Figure 10 shows the flow diagram of the proposed object tracking algorithm for
real-time video.
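
Frequency-domain template matching relies on the fact that cross-correlation becomes an element-wise product after a Fourier transform. The sketch below illustrates this on a one-dimensional signal, assuming circular correlation. To stay dependency-free it uses a naive discrete Fourier transform in place of an FFT, which gives the same result more slowly; it is not the algorithm of [27].

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (an FFT gives the same result faster)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def locate_template(signal, template):
    """Return the shift at which the template best matches the signal."""
    # Zero-pad the template to the signal length.
    padded = list(template) + [0.0] * (len(signal) - len(template))
    # Cross-correlation in the frequency domain: S(f) * conj(T(f)).
    spectrum = [s * t.conjugate() for s, t in zip(dft(signal), dft(padded))]
    correlation = [c.real for c in dft(spectrum, inverse=True)]
    # The peak of the correlation marks the most similar candidate position.
    return max(range(len(correlation)), key=correlation.__getitem__)
```

For images, the same idea applies with a two-dimensional transform, and the peak of the correlation map gives the estimated target location.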








(b) Target locating: track the candidate most similar to the template


Figure 10. Author Proposed FFT model as tracking algorithm 


d) Sec-D. Object detection model

Andreas Ess, Bastian Leibe et al. [29] propose a multi-hypothesis
approach for object detection. The hypotheses use Kalman filters to analyze the object. The
state of the object over time is estimated with the KF model over the complete set of trajectories. Figure 11 shows the
test results of the proposed object tracker.





Figure 11. Author Proposed Kalman Filter model as tracking algorithm 


Kalisa Wilson et al. [30] propose morphological operations and color
segmentation for detecting a moving object in a real-time implementation. The method also uses thresholding
as an image processing step. Kevin Leahy et al. [31] track an object that moves among a finite set of states
according to a Markov chain model. At each time instant, one state may be searched
for the target. It is known that searching one of the most likely locations of the target is optimal
in expectation.
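
The search strategy can be sketched as a belief update over the states of the Markov chain. The code below is a minimal illustration that assumes a known transition matrix and a perfect detector, so that an unsuccessful search rules out the searched state; the function names are hypothetical and this is not the implementation of [31].

```python
def predict(belief, transition):
    """Propagate the belief one step: b'[j] = sum_i b[i] * P[i][j]."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def most_likely_state(belief):
    """Index of the state with the highest probability (the state to search)."""
    return max(range(len(belief)), key=belief.__getitem__)


def update_after_miss(belief, searched):
    """Condition the belief on not finding the target in the searched state
    (assumes a perfect detector: a miss means the target is not there)."""
    remaining = [0.0 if i == searched else b for i, b in enumerate(belief)]
    total = sum(remaining)
    return [b / total for b in remaining]
```

Repeating predict, search, and update at every time instant concentrates the probability mass on the states that have not yet been ruled out.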

Shengping Zhang et al. [31] propose the HMAX model, which uses Gabor filters for detection
of the object, where the response of the simple cells is obtained using the second derivative of Gaussian
filters. The invariance property of the complex cells is obtained with a max pooling operator. Hiroshi Kera et al. [32]
use HSV color histograms to obtain the object properties. The method also
uses RootSIFT Fisher vectors with 64 dimensions for the detection of the object. In video-shot
segmentation, a median filter with a kernel size of 15 is applied to the sequence of affinities to cope with outliers.
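
The effect of the median filter on a sequence of affinities can be sketched as follows. The kernel size of 15 comes from the text; the shortening of the window at the ends of the sequence and the function name are assumptions made for illustration.

```python
from statistics import median


def median_filter(sequence, kernel_size=15):
    """Replace each value by the median of the window centred on it.

    Near the ends of the sequence the window is shortened (an assumption;
    other border rules such as padding are equally possible).
    """
    half = kernel_size // 2
    filtered = []
    for i in range(len(sequence)):
        window = sequence[max(0, i - half):i + half + 1]
        filtered.append(median(window))
    return filtered
```

A single outlier affinity inside a window of 15 values does not change the median, so isolated spurious shot boundaries are suppressed.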

Yuankai Q et al. [33] use CNN models, such as R-CNN, VGG-Net, AlexNet, and CaffeNet, for the classification and object
recognition tasks. A deeper architecture is obtained based on
VGG-Net. Figure 12 shows the main steps proposed by the authors for handling
object detection using a CNN.

Figure 12. Author Proposed CNN model as tracking algorithm


3. VALIDATION 


This paper has presented a comprehensive review of the state of the art in visual object tracking,
covering various algorithms from two perspectives, generative and discriminative. The motivation is that
many trackers have been proposed over the years for different applications, yet it remains unclear which tracker is suitable:
which can be used efficiently and accurately, and which delivers high performance and robustness in all conditions.
Through this review, we identified the differences between the trackers used for different
challenges, such as abrupt object motion from frame to frame, appearance change, non-rigid object structures,
occlusion, and illumination, with examples.


Table 1. Challenges with different trackers


CHALLENGES    Appearance Model    Illumination Model    Occlusion Model    Object detection Model
Tracker       MILT                CS                    TERC, DFT          KF
              MFR/CL              WLMC, OIF             CSK                MORPHOLOGY
              MJDL                HAAR, LBP             STC                HMAX MODEL
              IVT                 MUG                   BHAM               ROOT SIFT
              MIL                 MIL, ODFS             KLT                R-CNN
              CT                  AFS                   PF                 VGG-NET
              VTS                 VOCUS2                MFTA
                                                        FT
                                                        CT                 SW
                                                        PFF


Table 2. Tabulation of algorithms: focus area (dataset) used, strengths, and weaknesses. The above
review and tabulation of visual tracking algorithms with different context models are intended to provide
useful references for researchers in computer vision and related areas.


Ref | Algorithm used | Focus area (dataset) | Strengths | Weaknesses
[4] | Kalman filter | Video surveillance systems [human trackers] | The Kalman filter increases temporal consistency. | Deformations and occlusions of the target are the biggest challenge for this algorithm.
[5] | Particle filters | Human trackers; unmanned vehicles; robot training model | Works for any observation model and any motion model. Particle filters scale well. | Lack of diversity.
[6] | Discrete Kalman filter | Servo motor | Low range view of camera. Low FPS (frames per second). | Slower settling time.
Ref | Algorithm used | Focus area (dataset) | Strengths | Weaknesses
- | Graphical Model for Tracking-by-Detection | Human trackers | The tracking model concentrates on an accurate and smooth ego-motion estimate. | Interaction models are used to solve a two-stage process.

Ref | Algorithm used
[3] | Bayesian Tracking Approach
[9] | HOG features
[10] | Compressive Tracking
[11] | Multitask Joint Dictionary Learning
[12] | Multi-feature joint sparse representation
[14] | SimBIL
[15] | Model-based trackers
[16] | Subspace-based trackers
[17] | Markov Chain Monte Carlo
[18] | Balanced co-occurrence features
[19] | Minimum Uncertainty Gap estimation
[20] | Timed Motion History Image (TMHI model) with multi-feature adaptive fusion
[21] | Visual attention system (VOCUS2)
[22] | Ensemble of Random Classifiers (TERC)
[23] | Spatio-temporal context learning (STC)
[24] | Bayesian Hierarchical Appearance Model (BHAM)
[25] | Particle filter (PF)

Focus area (dataset), in row order:
Human; animals; movie frame
Bike rider
Human finger; human face
Human face
Humans
Animals
Books in the library; 3D model
Human face [pedestrians]
Human face on currency note
Dancer; pedestrians; human tracker; sky bird
Human tracker
Human; football; face; hand; human; bird; tiger
Basketball player
Human
Human

Strengths, in row order:
Bayesian analysis can be more robust to outliers by using more flexible distributions.
With HOG feature extraction, better results can be achieved on edges, cells, etc.
High probability can be achieved on dimensional feature and space using CT.
For sparse representation, depth information is provided using the MJDL model. It can handle large data.
Captures the frequently emerging outlier tasks in the object.
The refractive index structure constant is modelled by speckle interaction on a rough surface. The off-line training process is used for the tracker.
Multiple objects are represented in a single frame using the subspace tracker model.
High accuracy.
Large data can be handled. Selection of co-occurrence patterns is a major advantage in the RealAdaBoost system.
The highest likelihood score is achieved with the best state gap estimation.
For better target description, the HSV color feature and the edge orientation feature are used.
Better descriptive ability.
By introducing a latent variable, the classifier learns different appearance information, which gives accurate output.
Adopts occlusion detection and a region growing method; high computing efficiency.
Can handle full and partial occlusion with superior performance.
Accurate handling of illumination changes in object tracking is achieved by PF.

Weaknesses, in row order:
Complex to implement.
It works on a single orientation-independent edge presence count.
A single feature is used to represent the object.
Lack of flexibility; instability of the appearance model.
Complex in implementation.
Complex in implementation.
SimBIL is a long and time-consuming process.
The limitation of this model is that it can track a set of objects.
A larger data set cannot be handled in this model, as it has more variation in appearance. Off-line pre-trained data are required for tracking the object. Complex to implement. Time-consuming.
A single co-occurrence feature achieves lower accuracy.
Failed to track an object in many test videos.
Complex in implementation.
Average clearly outperforms
Complex in implementation.
Weak in robust object location.
Weak in multiple object tracking and deformable object tracking.
By using color PF in the tracking of an object, it is more immune to illumination.
Ref | Algorithm used | Focus area (dataset) | Strengths | Weaknesses
[26] | Saliency-based target descriptor | Human | This tracker handles illumination, clutter, similar background and occlusion very accurately. | Computation is more.
[27] | Online learning algorithm | Car; David indoor; Bolt; Coke | For the detection of the object, a coarse-to-fine sliding window search algorithm is used. | Only occlusion is detected.
[28] | RGBD trackers; 3D part-based sparse tracker | Human | For the detection of the object, a part-by-part spatial encoder is used. | This tracker is more sensitive to synchronization and registration noise.
[29] | Failure prevention, detection, and recovery mechanisms | Aerial images | It can handle different challenges. | Less speed and performance.
[30] | R-CNN features | A dataset of personal interaction with 29 sequences is used. | Pre-training achieved effective tracking of the object. | More data is required for the improvement of feature description and object-candidate generation.